Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: manage Notifier storage usage [DHIS2-17998] #19738

Merged
merged 20 commits into from
Jan 23, 2025
Merged

feat: manage Notifier storage usage [DHIS2-17998] #19738

merged 20 commits into from
Jan 23, 2025

Conversation

jbee
Copy link
Contributor

@jbee jbee commented Jan 21, 2025

Summary

PR improves the automatic and manual management of the amount of storage allocated by the Notifier (/api/system/tasks and /api/system/taskSummaries) API. In that context multiple new system settings were introduced to tune the automatic caps and cleanup behavior.

New System Settings

  • notifierLogLevel: what level of messages to include in the log/list, default DEBUG (all)
  • notifierMaxMessagesPerJob: each job can at most have this amount of messages in its list (soft enforced allowing momentary exceeding the limit by a few); default is 500
  • notifierMaxAgeDays: job data older than this number of days is discarded (soft enforced, cleanup after 1 minute of idle); default is 7
  • notifierMaxJobsPerType: if per JobType there are more than this number of jobs with data the oldest are discarded to get below this limit (soft enforced, cleanup after 1 minute idle); default is 500
  • notifierGistOverview: when true the overview pages will only show the first and last message of the list for each job; default true
  • notifierCleanAfterIdleTime: the time in milliseconds the notifier has to be idle (not moving messages from queue to store) before an automatic store cleanup is run using the notifierMaxAgeDays and notifierMaxJobsPerType as caps; default is 60sec

New API Endpoints

  • DELETE /api/system/tasks: delete all data (of all jobs of any type)
  • DELETE /api/system/tasks/{job-type}: delete all data of of all jobs of that type
  • DELETE /api/system/tasks/{job-type}/{job-id}: delete all data of the job

When not just deleting data for a specific job (first 2 URLs) 2 optional parameters can be used:

  • maxAge: limits the deletion to jobs that are at least of that age in days, e.g. maxAge=5 to delete all job data for jobs older than 5 days
  • maxCount: limit the deletion so at most data for this number of jobs is kept, oldest jobs are deleted; e.g. maxCount=10 to keep data for the 10 jobs which have the most recent data

"All data" always means both notifications and summary. Therefore there is no equivalent deletion with /api/taskSummaries.

Asynchronous Notifications

The new implementation of the Notifier uses a BlockingQueue to decouple the sources of Notification messages from the storage part. This has several advantages, but mainly was needed to know that adding messages to the store is never concurrent to other writes or cleanup. That simplifies a lot how the operations can be implemented. As a bonus it should also improve performance of jobs slightly. In addition exceptions from notification and storage will never escalate to the job thread.

The single thread processing also allowed to make sure all messages will always have an increase of at least 1ms
in their timestamp and that all message timestamps reflect the actual store insert order. Timestamps are only adjusted when needed.

Storage Changes

  • the JSON of a Notification itself got reduced since it contained multiple fields that can be recovered from context
  • the summary is stored as JSON and only wrapped in JsonValue - it is never converted back to its object model (just to be turned into JSON again in the API layer) which has less overhead and less issues with changes, class renames, moves etcetera.
  • in Redis only the notifications: key namespace is used to store the list of notifications (age of collection is extracted from most recent message)
  • in Redis only the summaries: key namespace is used to store the summary as JSON (to remember the time the value is wrapped and unwrapped in another object, but the code can detect and handle existing values without the wrapper)
  • A NotifierStore layer was introduced which handles the difference between in-memory and redis. The Notifier implementation itself is now common code. This allowed to keep each of the operations done in redis or in-memory so simple that it is easy to see that they are doing the right thing. This should help a lot in them both behaving the same.

Other Changes

  • with this change all jobs will at least forward their first and last message to the Notifier (this is also to ensure that any summary added always has notifications - this simplifies the cleanup)
  • the data value import job was still using the Notifier API directly in the service causing chaos with the messages where it would start and end twice and discard some of them. This was moved to the job and using JobProgress. A cleanup will be done in another PR.

Automatic Testing

Added new tests for the Notifier API including the added methods. The tests run with both InMemoryNotifierStore and RedisNotifierStore but using a FakeRedis. The FakeRedis itself is also tested directly to make sure it is a proper substitute for the real one.

Manual Testing

...

@jbee jbee self-assigned this Jan 21, 2025
@jbee jbee changed the title feat: manage notification storage usage [DHIS2-17998] feat: manage Notifier storage usage [DHIS2-17998] Jan 22, 2025
}

@Test
void testInsertingNotificationJobConcurrently() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI: This test is no longer possible without waiting a minute so I removed it.

@jbee jbee marked this pull request as ready for review January 23, 2025 12:38
@@ -192,6 +192,9 @@ public boolean isDefaultExecutedByCreator() {
}

/**
* @implNote since 2.42 all jobs forward to the {@link org.eclipse.emf.common.notify.Notifier} but
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be the DHIS Notifier?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

woops

*
* @author Jan Bernitt
*/
record FakeRedis(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was using a Redis test container looked into? Any specific reasons why this would be preferable over something that would more closely mirror actual use?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I did not. I wanted something that can be used in unit tests so the tests run fast and don't add in another level of complexity.

Copy link
Contributor

@david-mackessy david-mackessy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a very good & useful improvement 👍

Copy link

@jbee jbee merged commit 77a9789 into master Jan 23, 2025
17 checks passed
@jbee jbee deleted the DHIS2-17998 branch January 23, 2025 17:19
jbee added a commit that referenced this pull request Jan 29, 2025
* feat: gist for overview lists, limit setting [DHIS2-17998]

* refactor: NotifierStore implemented

* feat: clear and cap API and endpoints

* feat: cap for redis store

* fix: cap and clean consistency, log level filter for scheduling only

* chore: API cleanup, javadoc, some test fixes

* fix: notifier tests

* fix: hide transient empty in-memory stores in the API

* fix: update test assert with new message

* fix: remove notifier stubbing from mock test

* fix: e2e test assert for updated message

* fix: mock test setup, dependencies

* fix: add jackson core back in

* fix: exclude jackson core from dependency check

* fix: sonar issues

* test: notifier API tests for in-memory and redis

* fix: sonar warnings

* chore: fix typo
jbee added a commit that referenced this pull request Jan 30, 2025
* feat: manage Notifier storage usage [DHIS2-17998] (#19738)

* feat: gist for overview lists, limit setting [DHIS2-17998]

* refactor: NotifierStore implemented

* feat: clear and cap API and endpoints

* feat: cap for redis store

* fix: cap and clean consistency, log level filter for scheduling only

* chore: API cleanup, javadoc, some test fixes

* fix: notifier tests

* fix: hide transient empty in-memory stores in the API

* fix: update test assert with new message

* fix: remove notifier stubbing from mock test

* fix: e2e test assert for updated message

* fix: mock test setup, dependencies

* fix: add jackson core back in

* fix: exclude jackson core from dependency check

* fix: sonar issues

* test: notifier API tests for in-memory and redis

* fix: sonar warnings

* chore: fix typo

* chore: prevent null message logging + log cleanup [DHIS2-17998] (#19800)

* fix: maven dependencies + mock tests

* fix: add Long conversion

* fix: e2e test message expectations after change in wording
jbee added a commit that referenced this pull request Jan 30, 2025
* feat: manage Notifier storage usage [DHIS2-17998] (#19738)

* feat: gist for overview lists, limit setting [DHIS2-17998]

* refactor: NotifierStore implemented

* feat: clear and cap API and endpoints

* feat: cap for redis store

* fix: cap and clean consistency, log level filter for scheduling only

* chore: API cleanup, javadoc, some test fixes

* fix: notifier tests

* fix: hide transient empty in-memory stores in the API

* fix: update test assert with new message

* fix: remove notifier stubbing from mock test

* fix: e2e test assert for updated message

* fix: mock test setup, dependencies

* fix: add jackson core back in

* fix: exclude jackson core from dependency check

* fix: sonar issues

* test: notifier API tests for in-memory and redis

* fix: sonar warnings

* chore: fix typo

* chore: prevent null message logging + log cleanup [DHIS2-17998] (#19800)

* fix: maven dependencies + mock tests

* fix: add Long conversion

* fix: e2e test message expectations after change in wording
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants